ABSTRACT

Evaluating previous work is an important part of developing super-resolution methods that reconstruct images with
better perceptual quality at low computational complexity. Image reconstruction has been a very attractive research
topic over the last two decades, as image super-resolution is in demand in many real-world applications, from satellite
and aerial imaging to medical image processing, facial image analysis, text image analysis, signature and number plate
reading, and biometric recognition. The aim of this paper is to review some spatial domain image super-resolution
techniques as well as the associated challenge issues. This study is useful for accomplishing two goals:
(1) reconstructing images without losing perceptual quality, and (2) designing new algorithms, the main objective.

KEYWORDS: Super-Resolution, Interpolation, Maximum A Posteriori (MAP), Markov Random Field (MRF),
Hallucination 

I. INTRODUCTION 

Image resolution describes the detail contained in an image. Super-resolution (SR) is the problem of generating a
high-resolution (HR) image from one or more low-resolution (LR) images. When several LR images are used to enhance the
resolution, the process is called multi-image super-resolution. Multi-image SR works on the principle of combining the
nonredundant information contained in multiple LR frames to generate an HR image. When the HR image is obtained from a
single LR image, the process is referred to as single-image SR, which can also be used to increase image size. In
single-image interpolation, since no additional information is provided, the quality is very much limited by the
ill-posed nature of the problem, as the lost frequency components cannot be recovered. When multiple LR observations
are available, however, the problem is better constrained, because the nonredundant information contained in these LR
images can be exploited by aligning them with subpixel accuracy.

Many techniques have been proposed over the last two decades, representing approaches from the frequency domain to
the spatial domain and from the signal processing perspective to the machine learning perspective. Many researchers
developed the theory by exploring the shift and aliasing properties of the Fourier transform. However, these frequency
domain approaches are very restricted in the image observation model they can handle. Because real problems are much
more complicated, researchers nowadays most commonly address the problem in the spatial domain, which is flexible
enough to model all kinds of image degradation.

Several articles have surveyed the classical SR algorithms that recover HR images corrupted by the limitations of
the optical imaging system (Parulski et al., 1992), such as finite aperture size, which causes optical blur modelled by
the point spread function (PSF); finite aperture time, which results in motion blur; and finite sensor size, which
results in sensor blur. The intention of this article, in contrast, is to pinpoint the various difficulties inherent to
the SR problem and to consider enhancing images from LR frames degraded by downsampling and compression, degradations
that are applied in order to reduce storage space and transmission bandwidth. Although this survey does not cover all
spatial domain SR work in detail, it provides a comprehensive review of most of the SR works for each of the basic
methods, and it traces the evolution of each basic method through the modifications applied to it by different
researchers. This review is intended to be useful both for beginners in the field, who want to understand the available
methods, and for experts, who want to find out the current status of their methods of interest.

II. SPATIAL DOMAIN SR TECHNIQUES 

In this section, we explore three specific SR scenarios. Many researchers have optimized each scenario to better
balance the trade-off between preserving high-frequency details and computational complexity. Each technique addresses
a particular aspect of the general SR challenge. It is our hope that this work provides a foundation for future work
addressing the more complete SR problem.

1. INTERPOLATION-RESTORATION 

There are two basic criteria for evaluating the performance of an image interpolator: perceptual quality and
computational complexity. The conventional nearest neighbour, bilinear and bicubic operators are simple and fast to
implement, but they often introduce annoying "jaggy" artifacts around edges because local directional features in
images are not taken into consideration. Many algorithms [1]-[12] have been proposed to improve the subjective quality
of the interpolated images by imposing more accurate models. Filtered back projection methods [1] were among the first
methods developed for spatial domain SR. Adaptive interpolation techniques [2]-[4] spatially adapt the interpolation
coefficients to better match the local structures around edges. Iterative methods such as PDE-based schemes [5], [6]
and projection onto convex sets (POCS) schemes [7], [8] constrain the edge continuity and find the appropriate solution
through iterations. Edge-directed interpolation techniques [9], [10] employ a source model that emphasizes the visual
integrity of the detected edges and modify the interpolation to fit the source model. Other approaches [11], [12] borrow
techniques from vector quantization (VQ) and morphological filtering to facilitate the induction of high-resolution
images. Among the interpolation-based reconstruction methods, the adaptive directional interpolation algorithm [13] is
competitive and efficient compared with other methods, and it can be easily extended to any integer magnification
ratio. The construction of the adaptive directional interpolator is threefold. First, gradients from the LR image are
diffused to the desired HR grid to determine the edge orientations at missing pixels in the magnified image. Second,
linear interpolation is performed with position-fixed supports (i.e. the involved known pixels) and gradient-adaptive
weights. Third, the continuities between original and interpolated pixels are enforced by difference projection, which
simply reuses the directional interpolator.
However, this approach does not include any additional information to compensate for the high-frequency content
of the HR image to be constructed, which has been lost in the LR images. A number of super-resolution algorithms have
employed regularization terms to solve the ill-posed image up-sampling problem. These algorithms usually incorporate
smoothness priors as a constraint in reconstructing the HR images. However, using smoothness priors that are defined
artificially has been found to lead to overly smoothed results. Moreover, without an HR image prior as proper
regularization, interpolation-based approaches need special treatment of limited observations in order to reduce aliasing.
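
As a baseline for the interpolators discussed in this section, the following is a minimal NumPy sketch of plain bilinear upscaling by an integer factor. The function name `bilinear_upscale` and the corner-aligned sampling grid are assumptions made for illustration; they are not taken from any of the cited works.

```python
import numpy as np

def bilinear_upscale(img, factor):
    """Upscale a 2-D grayscale array by an integer factor with bilinear weights."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Position of every HR pixel expressed in LR pixel coordinates
    # (corner-aligned grid: first and last samples coincide with LR pixels).
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = (1 - wx) * img[y0][:, x0] + wx * img[y0][:, x1]
    bottom = (1 - wx) * img[y1][:, x0] + wx * img[y1][:, x1]
    return (1 - wy) * top + wy * bottom

lr = np.array([[0.0, 1.0], [0.0, 1.0]])
hr = bilinear_upscale(lr, 2)  # 4 x 4 image with a linear ramp along each row
```

Every output pixel is a convex combination of its four LR neighbours, so the result never leaves the range of the input values. This illustrates why such operators cannot restore lost high-frequency detail.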

2. STATISTICAL IMAGE RESTORATION 

There are some statistical approaches [14]-[30] that relate the SR reconstruction steps stochastically toward
optimal reconstruction. The HR image and the motions among the LR inputs can both be regarded as stochastic variables.
These methods enhance image sequences using motion information, so the motion must first be estimated by one of various
motion estimation methods. After motion computation, and assuming a prior distribution on the degradation matrix, this
knowledge is either integrated into a Bayesian SR framework or used, through the degradation bounds, to determine convex
sets that constrain the SR problem. SR reconstruction can be cast into a full Bayesian framework:

$$X = \arg\max_{X} \int_{v} \Pr(Y \mid X, M(v,h)) \, \Pr(X) \, \Pr(M(v,h)) \, dv \qquad (1)$$

Here, Y denotes the LR observations, M(v,h) is the degradation matrix defined by the motion vector v and the blurring
kernel h, Pr(Y | X, M(v,h)) is the data likelihood, Pr(X) is the prior term on the desired HR image and Pr(M(v,h)) is
the prior term on the motion estimation. Assuming M to be known, the Bayesian formulation in Eqn. 1 simplifies to the
popular MAP formulation for SR,

$$X = \arg\min_{X} \left\{ \| Y - MX \|^2 + \lambda A(X) \right\} \qquad (2)$$

where A(X) is the regularization term derived from the prior Pr(X) and $\lambda$ absorbs the variance of the noise. If
a uniform prior over X is assumed, Eqn. 2 reduces to the simplest maximum likelihood (ML) estimator. The ML estimator
relies on the observations only, seeking the solution that makes the observations most likely by maximising Pr(Y | X).

Statistical image reconstruction (SIR) methods have shown potential to improve image resolution compared with the
conventional filtered back projection method. According to MAP estimation, SIR methods can typically be formulated by
an objective function consisting of two terms: (1) a data fidelity term (equivalently, a data-fitting or data-mismatch
term) modelling the statistics of the projection measurements, and (2) a regularisation term reflecting prior
knowledge or expectations about the characteristics of the image to be reconstructed.
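
The two-term objective can be illustrated on a small 1-D problem. The sketch below is an assumption-laden toy and not any cited author's method: the degradation matrix is a hypothetical 3-tap blur followed by decimation by 2, the regularizer is a quadratic first-difference (smoothness) penalty, and the names `degradation_matrix`, `map_estimate` and `lam` are illustrative. Under these assumptions the minimizer of the data fidelity term plus `lam` times the regularization term has a closed form.

```python
import numpy as np

def degradation_matrix(n, factor=2):
    """Hypothetical M = D H: 3-tap blur H followed by decimation D."""
    H = np.zeros((n, n))
    for i in range(n):
        for k, wgt in zip((-1, 0, 1), (0.25, 0.5, 0.25)):
            # Clamp indices at the borders so every row of H sums to one.
            H[i, min(max(i + k, 0), n - 1)] += wgt
    D = np.eye(n)[::factor]  # keep every `factor`-th sample
    return D @ H

def map_estimate(y, M, lam):
    """Minimize ||y - M x||^2 + lam * ||G x||^2 with G a first-difference operator."""
    n = M.shape[1]
    G = np.diff(np.eye(n), axis=0)  # (n-1) x n matrix, row i computes x[i+1] - x[i]
    A = M.T @ M + lam * (G.T @ G)   # normal equations of the penalized problem
    return np.linalg.solve(A, M.T @ y)

n = 16
M = degradation_matrix(n)
x_true = np.linspace(0.0, 1.0, n)
y = M @ x_true                      # noiseless LR observation
x_hat = map_estimate(y, M, lam=1e-3)
```

With `lam = 0` the system matrix `M.T @ M` is singular here, because M has fewer rows than columns. This is the ill-posedness that the regularization term removes.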

2.1 Maximum a Posteriori (MAP) Estimation

Mathematically, reconstruction from low-resolution images degraded by downsampling and compression is an ill-posed
problem due to the presence of quantization noise and other inconsistencies in the projection data. Therefore, an
image estimate that directly optimizes the maximum likelihood (ML) criterion can be very unstable and noisy.
Researchers have reformulated this problem as MAP estimation by imposing a prior term to penalize or regularize the
solution. The prior term makes it possible to incorporate available information or expected properties of the image to
be reconstructed. The MAP estimator can be written mathematically as:
$$\mu^* = \arg\max_{\mu} P(\mu \mid N) \qquad (3)$$

where $\mu$ denotes the image to be reconstructed and N the measured data. According to Bayes' law,

$$P(\mu \mid N) = \frac{P(N \mid \mu)\, P(\mu)}{P(N)} \qquad (4)$$

By taking the logarithm and omitting the irrelevant term ln P(N), which does not depend on $\mu$, the MAP estimator
can be simplified to:

$$\mu^* = \arg\max_{\mu} \left[ \ln P(\mu \mid N) \right] = \arg\max_{\mu} \left[ \ln P(N \mid \mu) + \ln P(\mu) \right] \qquad (5)$$

It can be written as:

$$\mu^* = \arg\max_{\mu} \left[ L(\mu) - \beta U(\mu) \right] \qquad (6)$$

where $L(\mu) = \ln P(N \mid \mu)$ is the log-likelihood, U denotes a penalty, and $\beta > 0$ is a scalar control
parameter which allows one to tune the MAP (or penalized ML) estimation for a specific noise-resolution tradeoff. When
$\beta$ goes to zero, the reconstructed image from the MAP estimation approaches the ML estimate.

From the Tikhonov regularization point of view, MAP estimation can be considered as optimizing an objective
function consisting of two terms: a data-fidelity term (e.g. the log-likelihood) modelling the statistics of the
projection measurements, and a regularization term (e.g. the log-prior) incorporating prior knowledge or expected
properties of the image to be reconstructed. The resulting objective function in Eqn. (6) is concave if and only if
$U(\mu)$ is a convex function of $\mu$. Hardie et al. [24] proposed a joint MAP framework for simultaneous estimation
of the HR image and motion parameters with a Gaussian MRF (GMRF) prior for the HR image. Bishop et al. [25] proposed a
simple Gaussian process prior where the covariance matrix Q is constructed from spatial correlations of the image
pixels. Owing to its good analytical properties, the Gaussian process prior allows a Bayesian treatment of the SR
reconstruction problem, in which the unknown HR image is integrated out for robust estimation of the observation model
parameters (unknown PSFs and registration parameters). Although the GMRF prior has many analytical advantages, a common
criticism of it in SR reconstruction is that the results tend to be overly smooth, penalizing the sharp edges that we
desire to recover. To encourage piecewise smoothness and better preserve edges, image gradients are instead modelled
with a distribution with heavier tails than the Gaussian, leading to the popular Huber MRF, where the potentials are
determined by the Huber function.



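$$\rho_{\alpha}(a) = \begin{cases} a^2, & |a| \le \alpha \\ 2\alpha |a| - \alpha^2, & \text{otherwise} \end{cases} \qquad (7)$$

where the threshold $\alpha > 0$ sets the transition between the quadratic and the linear region.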



Here, a is the first derivative of the image. Schultz and Stevenson [14] applied this Huber MRF to the single-image
expansion problem, and later to the SR reconstruction problem in [22]. Many later works on SR employed the Huber MRF
as the regularization prior, such as [17]-[23]. The total variation (TV) norm, a gradient penalty function, is also very
popular in the image denoising and deblurring literature [26]-[28]. The TV criterion penalizes the total amount of
change in the image as measured by the $\ell_1$ norm of the gradient magnitude:

$$F(X) = \| \nabla X \|_1 \qquad (8)$$
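
To make the difference between these edge-preserving penalties concrete, the following NumPy sketch evaluates a Huber MRF penalty and an anisotropic TV penalty on image differences. It assumes the common form of the Huber function (quadratic below a threshold `alpha`, linear above it) and approximates the gradient by horizontal and vertical first differences; the function names are illustrative.

```python
import numpy as np

def huber(a, alpha):
    """Huber function: a^2 for |a| <= alpha, 2*alpha*|a| - alpha^2 otherwise."""
    a = np.asarray(a, dtype=float)
    quadratic = a ** 2
    linear = 2 * alpha * np.abs(a) - alpha ** 2
    return np.where(np.abs(a) <= alpha, quadratic, linear)

def image_gradients(x):
    """Horizontal and vertical first differences of a 2-D image."""
    x = np.asarray(x, dtype=float)
    return np.diff(x, axis=1), np.diff(x, axis=0)

def huber_mrf_penalty(x, alpha):
    gx, gy = image_gradients(x)
    return huber(gx, alpha).sum() + huber(gy, alpha).sum()

def tv_penalty(x):
    """Anisotropic TV: l1 norm of the horizontal and vertical differences."""
    gx, gy = image_gradients(x)
    return np.abs(gx).sum() + np.abs(gy).sum()

# A sharp vertical edge of height 10 in a 4 x 4 image.
edge = np.zeros((4, 4))
edge[:, 2:] = 10.0
quadratic_cost = sum((g ** 2).sum() for g in image_gradients(edge))  # 400.0
huber_cost = huber_mrf_penalty(edge, alpha=1.0)                      # 76.0
tv_cost = tv_penalty(edge)                                           # 40.0
```

The quadratic (Gaussian) penalty grows with the square of the edge height, while the Huber and TV penalties grow only linearly, so a sharp edge is penalized far less severely.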
Statistical image reconstruction techniques are mainly adopted for clinical use in X-ray CT and in emission
tomography modalities. Statistical techniques have several attractive features [29], [30]. The data noise is
statistically modelled, offering the potential for better bias-variance performance. Statistical methods also easily
incorporate the system geometry, detector response, object constraints and any prior knowledge. They can also model
phenomena such as scatter and energy dependence, leading to more accurate, lower-noise reconstruction. They are well
suited to arbitrary geometries and situations with truncated data. Their main drawback is longer computation time.
For clinical CT images with typical sizes of 512 × 512 pixels or larger, conventional statistical methods require
prohibitively long computation times, which hinders their use.



3. LEARNING BASED RESTORATION

Different from the previous approaches, learning-based methods super-resolve a single LR image. Unlike previous
methods, where priors are in parametric form and regularize the whole image, the example-based methods develop the
prior by sampling from other images in a local way, similar to [31], [32]. One family of approaches in this class [33]
exploits neighbourhood relationships by using a Markov network to probabilistically model the relationships between
high- and low-resolution patches, and between neighbouring high-resolution patches. It uses an iterative algorithm,
which usually converges quickly. Such approaches usually work by maintaining two sets of training patches,
$\{x_i\}_{i=1}^{n}$ sampled from the high-resolution images, and $\{y_i\}_{i=1}^{n}$ sampled from the low-resolution
images correspondingly. Each patch pair $(x_i, y_i)$ is connected by the observation model $y_i = DHx_i + v$. This
high- and low-resolution co-occurrence model is then applied to the target image for predicting the HR image in a
patch-based fashion.

3.1 Markov Model

Spatial relationships between patches are modelled using a Markov network, which has many well-known uses in
image processing. In Figure 1, the circles represent network nodes, and the lines indicate statistical dependencies
between nodes. We let the low-resolution image patches be observation nodes, y. We select the 16 or so closest examples
to each input patch as the different states of the hidden nodes, x, that we seek to estimate. For this network, the
probability of any given high-resolution patch choice for each node is proportional to the product of all sets of
compatibility matrices relating the possible states of each pair of neighboring hidden nodes, and vectors relating each
observation to the underlying hidden states:

$$P(x \mid y) = \frac{1}{Z} \prod_{(ij)} \psi_{ij}(x_i, x_j) \prod_{i} \phi_i(x_i, y_i) \qquad (9)$$

Z is a normalization constant, and the first product is over all neighboring pairs, i and j. $y_i$ and $x_i$ are the
observed low-resolution and estimated high-resolution patches at node i, respectively. To specify the
$\psi_{ij}(x_i, x_j)$ function, the sum of squared differences $d_{ij}(x_i, x_j)$ between the patch candidates $x_i$ and
$x_j$ is measured in their overlap region at nodes i and j. Assuming $\sigma$ as a noise parameter, the compatibility
matrix between nodes i and j is

$$\psi_{ij}(x_i, x_j) = \exp\left( -\frac{d_{ij}(x_i, x_j)}{2\sigma^2} \right) \qquad (10)$$

With a Markov random field (MRF) model as shown in Figure 1, the observation model parameters are assumed to be
known a priori, and a tight coupling between the target image and the training set is needed. The patch size should
also be chosen carefully. If the patch size is too small, the co-occurrence prior becomes too weak to make meaningful
predictions; if the patch size is very large, a huge training set is needed to find close patches for the current
observation.
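
The compatibility between candidate patches at two neighbouring nodes can be sketched as follows. This is a minimal illustration, assuming that node j lies immediately to the right of node i, that candidates are square patches of equal size, and that compatibility is the exponential of the negative overlap SSD divided by twice the squared noise parameter; the function name and argument layout are hypothetical.

```python
import numpy as np

def compatibility_matrix(cands_i, cands_j, overlap, sigma):
    """psi[k, l] = exp(-d / (2 * sigma**2)) for candidate k at node i and l at node j.

    cands_i, cands_j: arrays of shape (K, P, P) holding K candidate HR patches.
    overlap: number of pixel columns shared by horizontally adjacent patches.
    """
    cands_i = np.asarray(cands_i, dtype=float)
    cands_j = np.asarray(cands_j, dtype=float)
    right_of_i = cands_i[:, :, -overlap:]  # last columns of the left-hand patch
    left_of_j = cands_j[:, :, :overlap]    # first columns of the right-hand patch
    # Pairwise differences between every candidate of i and every candidate of j.
    diff = right_of_i[:, None] - left_of_j[None, :]
    d = (diff ** 2).sum(axis=(2, 3))       # sum of squared differences in the overlap
    return np.exp(-d / (2.0 * sigma ** 2))

cands = np.stack([np.zeros((3, 3)), np.ones((3, 3))])
psi = compatibility_matrix(cands, cands, overlap=1, sigma=1.0)
# Identical overlaps give compatibility 1; mismatched ones are down-weighted.
```

A smaller `sigma` makes the network less tolerant of disagreement in the overlap region, which is the sense in which it acts as a noise parameter.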



Figure 1: Markov Network Model for the SR Problem. The LR Patches at Each Node y_i are the Observed
Input. The HR Patch at Each Node x_i is the Quantity We Want to Estimate

Baker and Kanade [34], [35] demonstrated that the reconstruction constraint used in many regularization-based
methods provides less and less useful information as the zooming factor increases. They proposed a "hallucination"
algorithm to break the limit of the reconstruction constraint. To estimate the high-frequency components of the HR
image, a training set of multiscale feature vectors, each composed of LR details and the corresponding HR details, is
searched for the best match between the patches of the LR input image and the LR pixel values of the feature vectors.
Most example-based SR algorithms [36]-[40] also involve a training set, which is usually composed of a large number of
HR patches and their corresponding LR patches. The input LR image is segmented into patches, which may be either
overlapping or non-overlapping. Then, for each LR patch from the test image, either the single best-matched patch or a
set of the best-matched LR patches is selected from the training set. The corresponding HR patches are used to
reconstruct the output HR image. Freeman et al. [36], [37] embedded two matching conditions into a Markov network. One
is that the LR patch from the training set should be similar to the input observed patch, while the other is that the
contents of the corresponding HR patch should be consistent with its neighbours. Wang et al. [38] extended the Markov
network to handle the estimation of PSF parameters. Stephenson and Chen [39] presented a method in which the symmetry
of a cropped human face is considered in the Markov network. Qiu [40] proposed an alternative method, based on vector
quantization, to organize example patches. A survey of example-based super-resolution methods is available in [41].
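
The patch lookup at the core of these example-based schemes can be sketched in a few lines. The code below is a simplified illustration and not a reproduction of any cited algorithm: it assumes a brute-force search with a sum-of-squared-differences distance and, when more than one match is requested, combines the corresponding HR patches by a plain mean. The names `best_matches` and `predict_hr_patch` are hypothetical.

```python
import numpy as np

def best_matches(lr_patch, train_lr, k=1):
    """Indices of the k training LR patches closest (in SSD) to lr_patch."""
    flat = train_lr.reshape(len(train_lr), -1)
    d = ((flat - lr_patch.ravel()) ** 2).sum(axis=1)
    return np.argsort(d)[:k]

def predict_hr_patch(lr_patch, train_lr, train_hr, k=1):
    """HR prediction: the best-matched HR patch (k=1) or the mean of the k best."""
    idx = best_matches(lr_patch, train_lr, k)
    return train_hr[idx].mean(axis=0)

# Toy training set: three constant 2x2 LR patches and their 4x4 HR partners.
train_lr = np.stack([np.full((2, 2), v) for v in (0.0, 1.0, 5.0)])
train_hr = np.stack([np.full((4, 4), v) for v in (0.0, 10.0, 50.0)])
query = np.full((2, 2), 0.9)

single = predict_hr_patch(query, train_lr, train_hr, k=1)    # copies the closest example
averaged = predict_hr_patch(query, train_lr, train_hr, k=2)  # mean of the two closest
```

The brute-force search costs time proportional to the training set size for every input patch, which is why organizing the examples, for instance by vector quantization, matters for large training sets.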

However, most of these existing algorithms involve only a kind of "searching and pasting" approach, and are
therefore computationally intensive when searching for an LR-HR patch pair in a huge training set. Furthermore,
best-matched but incorrect patches will seriously degrade the reconstruction results. To deal with these problems, the
algorithms usually simply adopt the average of a set of the "best-matched" patches; the averaged high-frequency
component is then pasted into the magnified image. For example, Qiu [40] employed a "classifying and averaging" scheme.
However, the averaging results in over-smoothing of the output HR image. Li [41] employed class-specific predictors so
as to learn efficiently from separate class-specific databases. The learned prior terms discussed above impose local
constraints on the super-resolved image. For imposing global constraints, some SR algorithms have used projection-based
methods for learning the a priori term of the employed MAP algorithm; for example, a kernel-PCA based prior, a
non-linear extension of common PCA, was embedded in a MAP method to take into account more complex correlations of
human face images. In PCA-based methods, the matrices representing the training images are usually first vectorized
(e.g. by stacking all the columns of each matrix into a single column) and then combined into a large matrix to obtain
the covariance matrix of the training data for modelling the eigenspace. It is discussed in [42] that such
vectorization of training images may not fully retain their spatial structure information. Instead, it is suggested to
apply such a vectorization to the features extracted from the training data and use them for SR.

3.2 How to Design and Generate a Training Database?




Figure 2: (a) Generation of the HR Difference Image L_0 and the LR Difference Image L_1 for the Construction of
HR-LR Patches, and (b) A 4×4 HR Block in L_0 and its Corresponding LR Block in L_1

The choice of training set is essential to the performance of learning-based super-resolution methods. Each record
in the training set is an example patch pair, viz. an HR image block and the corresponding LR block. Similar to the
method proposed by Qiu [40], a multi-resolution representation of an input image is formed using a three-level
Laplacian pyramid. As shown in Figure 2(a), let I_0 represent an HR example image, which is blurred and down-sampled by
a zooming factor z to produce I_1; similarly, I_2 is generated from I_1 using the same zooming factor z. The up-sampled
images from I_1 and I_2 are generated using bilinear interpolation with a factor z, and are then subtracted from I_0 and
I_1, respectively, to compute the difference images L_0 and L_1. The example patch pairs are then extracted from L_0 and
L_1 and used to train the predictors.

Figure 2(b) shows an HR difference image L_0. If z = 2, each 4×4 HR block in L_0, e.g. the grey block, has a
corresponding 2×2 LR block in L_1, viz. the black block. In order to maintain the continuity of an HR block with its
neighbors, we extend the boundary of the corresponding LR block by 1 pixel to form an LR sampling block, i.e. the LR
block in black and the neighboring pixels in grey in L_1, as shown in Figure 2(b). This HR block and the corresponding
LR sampling block thus form a patch pair. By considering all the possible HR blocks in L_0 and the corresponding LR
sampling blocks, a training set of patch pairs is generated. Example-based algorithms work best when the resolution and
noise degradation of the training data match those of the images to which they are applied.
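
The construction of the training pairs can be sketched for z = 2 as follows. Several details here are assumptions rather than part of the described method: blurring and down-sampling are approximated by 2×2 box averaging, the bilinear up-sampling uses a corner-aligned grid, HR blocks are taken at positions aligned with the LR pixel grid, and all function names are illustrative.

```python
import numpy as np

def downsample2(img):
    """Blur and decimate by 2 using 2x2 box averaging (an assumed blur kernel)."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_bilinear(img, shape):
    """Bilinear interpolation of img onto a grid of the given shape."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, shape[0])
    xs = np.linspace(0, w - 1, shape[1])
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = (1 - wx) * img[y0][:, x0] + wx * img[y0][:, x1]
    bottom = (1 - wx) * img[y1][:, x0] + wx * img[y1][:, x1]
    return (1 - wy) * top + wy * bottom

def training_pairs(I0):
    """Return (LR sampling block, HR block) pairs from the difference images."""
    I0 = np.asarray(I0, dtype=float)
    I1 = downsample2(I0)
    I2 = downsample2(I1)
    L0 = I0 - upsample_bilinear(I1, I0.shape)  # HR difference image
    L1 = I1 - upsample_bilinear(I2, I1.shape)  # LR difference image
    pairs = []
    for i in range(1, L1.shape[0] - 2):
        for j in range(1, L1.shape[1] - 2):
            # 2x2 LR block at (i, j), extended by a 1-pixel border to 4x4.
            lr = L1[i - 1:i + 3, j - 1:j + 3]
            # Co-located 4x4 HR block in L0.
            hr = L0[2 * i:2 * i + 4, 2 * j:2 * j + 4]
            pairs.append((lr, hr))
    return pairs

rng = np.random.default_rng(0)
pairs = training_pairs(rng.random((16, 16)))  # 25 pairs of 4x4 blocks
```

In practice the pairs from many example images would be pooled, and the choice of blur kernel should mimic the degradation expected in the images to be super-resolved.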
4. SR WITH COMBINED METHODS

In order to better solve the SR problem, some researchers have combined the above algorithms, resulting in new
groups of algorithms. Examples of this category can be found in:

• [47] and [48]: ML and POCS are combined, which allows incorporating non-linear a priori knowledge into the process.

• [49] and [50]: MAP and POCS are combined and applied to compressed video.

• [51]: MAP and FBP are combined.

• [52] and [53]: reconstruction-based SR is followed by learning-based SR.

III. CHALLENGE ISSUES 

In this article, we presented only a few methods and insights for specific scenarios of super-resolution. Many
questions still persist in developing a generic super-resolution algorithm capable of producing high-quality results on
general image sequences. Furthermore, analysis of this sort could provide an understanding of the fundamental limits of
super-resolution imaging, thereby helping practitioners to find the correct balance between an expensive optical
imaging system and image reconstruction algorithms. Such analysis may also be phrased as general guidelines for
developing practical super-resolution systems.

In building a practical super-resolution system, many important challenges lie ahead. For instance, in many of
the optimization routines used in this and other articles, the task of tuning the necessary parameters is often left to
the user. Parameters such as the regularization weight $\lambda$ can play an important role in the performance of
super-resolution algorithms. Although the cross-validation method can be used to determine the parameter values for the
nonrobust super-resolution method (Nguyen et al., 2001a), a computationally efficient way of implementing such a method
for the robust super-resolution case has not yet been addressed. Although some work has addressed the joint task of
motion estimation and super-resolution (Hardie et al., 1997; Schultz et al., 1998; Tom and Katsaggelos, 2001), the
related problems still remain largely open. Another open challenge is that of blind super-resolution, wherein the
unknown parameters of the imaging system's PSF must be estimated from the measured data. Many single-frame blind
deconvolution algorithms have been suggested in the last 30 years (Kundur and Hatzinakos, 1996), and Nguyen et al.
(2001a) incorporated a single-parameter blur identification algorithm in their super-resolution method, but there
remains a need for more research to provide a super-resolution method together with a more general blur estimation
algorithm from aliased images. Also, the challenge of simultaneous resolution enhancement in time as well as space has
recently received growing attention (Robertson and Stevenson, 2001; Shechtman et al., 2002).

Adding features such as robustness, memory and computation efficiency, color consideration, and automatic
selection of parameters to super-resolution methods will be the ultimate goal for super-resolution researchers and
practitioners in the future.

IV. CONCLUSIONS 

This paper reviews most of the papers on spatial domain super-resolution and proposes a broad taxonomy for them.
In addition to giving details of most of the techniques, it notes the pros and cons of each method when they are
available in the reviewed paper. Furthermore, it highlights the most common challenge issues encountered while dealing
with them.

This overview has come to its end, but it still cannot answer the very important question of which
super-resolution algorithms are the state of the art. In truth, the answer is highly dependent on the application. An SR
algorithm that is good for clinical imaging is not necessarily good for aerial images or facial image processing. In
different applications, different algorithms are leading. That is why there are still many recent publications for
almost all types of the surveyed algorithms. This is mainly due to the different constraints that are imposed on the
problem in different applications. Therefore, it seems difficult to compare super-resolution algorithms for different
applications against each other. Generating a benchmark database for learning examples also seems quite challenging.
Generally speaking, when frequency domain algorithms are compared with spatial domain ones, the former are very
interesting from the theoretical point of view but face many problems when applied to real-world scenarios; for
example, spatial domain methods have evolved better, with proper modelling of the motion in real-world applications,
which frequency domain methods mostly lack. Among these methods, the single-image based methods are more
application-dependent, while the multiple-image based methods have been applied to more general applications. The
multiple-image based methods are generally composed of two steps: motion estimation, then fusion. These had, and still
have, limited success because of their lack of robustness to motion error (not motion modeling). Most recently,
implicit nonparametric methods have been developed that remove this sensitivity, and while they are slow and cannot
produce huge improvement factors, they fail very gracefully and produce quite stable results at modest improvement
factors.